Text File | 1996-01-07 | 26KB | 607 lines

MIT-APurify v1.3
----------------
MIT-syntax version (GCC).

(c) by Samuel DEVULDER, Jan. 1996
Samuel.Devulder@info.unicaen.fr


DESCRIPTION (SHORT):
--------------------
This is APurify for compilers that produce asm files in MIT syntax. As far
as I know, only GCC uses this syntax, so this is in effect the version for
the GCC compiler. If you are using another compiler, read MOT-APurify
instead. In the rest of this document, APurify stands for MIT-APurify, and
I assume you are using the GCC compiler.

APurify is a program that lets you detect bad memory accesses in your
programs without any specific external device (MMU). It helps you catch
bugs caused by accessing memory not owned by your program.


INSTALLATION:
------------
This archive contains the version of APurify for the GCC compiler as well
as the versions for other compilers. Here is a description of the
GCC-related files in the archive, and what to do with each of them to
install this version.

    - doc/MIT-APurify.doc
        The file you are currently reading. Put it with all your doc
        files. It is useful from time to time.

    - doc/History
        The whole history (this file is not very useful for most
        people). Do whatever you want with it.

    - bin/MIT-APurify
        The parser tuned for the MIT syntax. Rename it to APurify and put
        it somewhere in your path.

    - lib/APur-gcc.a
        The link-time library. Rename it to APur.a and put it somewhere
        in your library search path.

    - test/test.c
        Source of a stupid test file. It is only here to let you rebuild
        the test program. Do whatever you want with it.

    - test/test.gcc
        The test file, APurify'ed. Run it to see how useful APurify
        is :-).


SYNOPSIS:
--------
Usage: APurify [-revinfo] <inputfile> [options]
Where options can be:
    ?           To display this usage
    -h          To display this usage
    -?          To display this usage
    -tb         To test memory referenced through base register
    -ts         To test memory referenced through stack register
    -tl         To test memory referenced through local stack frame
    -tp         To test pea instructions
    -o arg      Specifies output file (def=%s)
    -br arg     Sets the base register (def=A4)
    -mp arg     Sets the main entry-point (def=_main)

Options can be anywhere on the command line. NOTE: they can no longer be
merged together; they must be separated by a space. You can pre-define
them with the environment variable AP_MITP_OPT. For example, if you do:

    CLI> SetEnv AP_MITP_OPT "-tb -br A5"

then "-tb -br A5" will automatically be added to the command line. The
space between an option and its argument can be omitted. Thus "-br A4" is
the same as "-brA4".

Here is a description of arguments and flags:

-revinfo
    This displays information about APurify (name, size and date of
    modules, and the number of compilations done for this version).

-br arg
    This sets the base register used to reference memory in the
    SMALL_DATA model. Usually A4 is used for that purpose, and that is
    the default. If A5 is used instead, add -brA5 to your command line.

-tb
    This enables APurify to check all memory referenced through the base
    register (see -br). If you are using the SMALL_DATA model, add this
    flag to your command line. By default, APurify won't check memory
    referenced through the base register.
    NOTE: for the safest check, you should always use this option, even
    if you are not using the SMALL_DATA model (A4 may be used as a
    temporary register in that case). To do this, you can use the
    environment variable AP_MITP_OPT.

-ts
    This enables APurify to check memory referenced through the stack
    pointer (SP or A7). By default APurify won't check such memory
    accesses (to reduce the code size and increase the runtime speed).
    This option will detect when you have no more room on your stack
    (stack overflow).

-tl
    This enables APurify to check memory referenced through the local
    stack pointer (the one that is link'ed and unlink'ed when entering
    and exiting a C function). By default, this is switched off. This
    option allows APurify to detect stack overflow.

-tp
    This enables APurify to check indirect addresses pushed onto the
    stack with a pea instruction. By default this is off. When used, this
    option will check things like "pea a2@(10)". This can help you with
    memory accessed through a pointer in code that has not been
    APurify'ed. For example, it is useful for things like
    fread(&ptr[10],10,1,fp), because in that case the "pea a2@(10)" used
    to push &ptr[10] onto the stack will be checked, and if ptr[10] is
    not owned by your program, you'll get an APurify error. Please note
    that this may not work all the time, since &ptr[0] can be translated
    as "movel a0,sp@-", which won't be checked.

-o arg
    This specifies the name of the output file. If omitted, the output
    file will be the same as the input file (source file). The name of
    the output file can be given as a real name or as a pattern. A
    pattern is a string in which special sequences of characters (called
    specifiers) are replaced by special strings. Let's suppose that the
    input file is drive:path/file.ext. Here is a description of the
    specifiers:

        %s  the full source name:                        drive:path/file.ext
        %S  the full source name without the extension:  drive:path/file
        %b  the full basename:                           file.ext
        %B  the basename without the extension:          file
        %p  the path (ending "/" or ":" is included):    drive:path/
        %e  the extension ("." is omitted):              ext

    Thus, if you put "-o ram:%B-apurify.%e" on the command line, the
    output file will be "ram:file-apurify.ext" with our example.

-mp arg
    This tells APurify which label should be considered as the entry
    point. By default it is set to "_main", and it should not be
    modified.

-? -h ?
    Obvious options.

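The -o specifiers described above can be illustrated with a small sketch.
This is not APurify's own code: expand_pattern() is a hypothetical helper
written only to mirror the documented behaviour of %s, %S, %b, %B, %p and
%e. How APurify treats an unknown specifier is not documented, so copying
it through unchanged is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of APurify): expand a -o style pattern
 * for the source file `src` into `out`, which holds `cap` bytes.
 * Returns 0 on success, -1 if the result does not fit. */
static int expand_pattern(const char *pattern, const char *src,
                          char *out, size_t cap)
{
    const char *end = src + strlen(src);
    const char *base = src;
    const char *p;

    /* The path ends at the last '/' or ':' (that character included). */
    for (p = src; *p; ++p)
        if (*p == '/' || *p == ':')
            base = p + 1;

    /* The extension starts after the last '.' of the basename, if any. */
    const char *dot = strrchr(base, '.');
    const char *stem_end = dot ? dot : end;
    const char *ext = dot ? dot + 1 : end;

    size_t n = 0;
    assert(cap > 0);
    for (p = pattern; *p; ++p) {
        const char *from = p;   /* default: copy one literal character */
        size_t len = 1;
        if (*p == '%' && p[1] != '\0') {
            switch (p[1]) {
            case 's': from = src;  len = (size_t)(end - src);       break;
            case 'S': from = src;  len = (size_t)(stem_end - src);  break;
            case 'b': from = base; len = (size_t)(end - base);      break;
            case 'B': from = base; len = (size_t)(stem_end - base); break;
            case 'p': from = src;  len = (size_t)(base - src);      break;
            case 'e': from = ext;  len = (size_t)(end - ext);       break;
            default:  from = p;    len = 2;  /* assumption: keep "%x" */
                      break;
            }
            ++p;                /* skip the specifier letter */
        }
        if (n + len >= cap)
            return -1;          /* no room for the text plus the NUL */
        memcpy(out + n, from, len);
        n += len;
    }
    out[n] = '\0';
    return 0;
}
```

Called with the pattern "ram:%B-apurify.%e" and the source name
"drive:path/file.ext", the helper fills the buffer with
"ram:file-apurify.ext", the result given in the -o description.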

DESCRIPTION (A BIT LONGER):
--------------------------
As a general rule, at the microprocessor level, there are two ways to
access memory: direct access and indirect access. For example, in C,
direct access can be viewed as accessing global variables, and indirect
access as accessing an array element. More precisely, direct access
corresponds to reading or writing a variable whose address is known at
compilation time (or once the program is loaded into memory). Indirect
access is used for variables whose address is determined dynamically by
the program. For example, if p is a pointer to an array allocated by
malloc(), *p is an indirect access. Such an access also occurs for an
expression like T[i] where T is a global array, because the address of
T[i] is not known at compilation time: it depends on the value of the
index i. Using indirect access to memory is called indirection.

A regular program must not access memory it does not own. That kind of
access can be qualified as illegal. Illegal direct access to memory is not
possible because, by definition, only global variables can be accessed
that way, and those variables obviously belong to the program (except for
code written in assembly language that references absolute addresses, for
example "btst #6,$bfe001"; but that kind of code is not good
programming :-)). So we can assume that direct access to memory is always
right.

On the other hand, indirect access to memory can certainly be illegal.
Many bugs are caused by overstepping array boundaries. If the overstepping
happens while reading a value, there is not much trouble for the other
running tasks (it is an error inside your task); but if it happens while
writing, you may directly interfere with other tasks and a big mess can
follow (total breakdown of the system).

APurify works on that kind of access by verifying the validity of indirect
accesses to memory.
It remembers the memory that was allocated by the program and checks the
validity of each access. One might think that this makes a lot of tests!
Well, yes, but APurify is not designed for programs in everyday use, only
for test phases. Moreover, indirections do not actually occur very often.
Only array-based variables produce indirections. Thus, the variables on
the stack, although accessed by indirection, are not checked by default
because their access is always safe (at least if there is no stack
overflow!). Also, in the SMALL_DATA model, global variables are accessed
through indirection, but they are not checked by default.

If an illegal access is found, APurify displays an error message, by
default on the error stream of the program. There are two kinds of illegal
accesses. Some are accesses to memory that does not belong to the program
at all (this is called an access between blocks); others are accesses that
cover one part of memory owned by the program and another part not owned
by it (this is an overstepping of a block). You can see this visually. If
[ 1 ] and [ 2 ] represent two blocks allocated by the program and ( 3 )
the memory accessed, then

    ---- [ 1 ] ---- ( 3 ) ---- [ 2 ] ---->
    0                    increasing address

corresponds to the first kind of illegal access, and

    ---- [ 1 ( ] 3 ) ---- [ 2 ] ----->
or
    ---- [ 1 ] ---- ( 3 [ ) 2 ] ----->

corresponds to the second kind. The first kind is very common but the
second is quite rare (it is rather a misalignment problem).

APurify has two output modes. One is verbose and tries to give a lot of
information in words. The other is briefer and gives you the same
information, but you have to decode it.

When APurify starts and ends, it outputs the date/time. This is useful if
you are using logfiles: you can keep all your logs in a single file and
retrieve any execution by its date.

In case of an error, APurify displays some text. The first line looks like
this one:

    **** APURIFY ERROR ! [$<N1>(<N2>) <ATTR> (<TEXT1>)] <TEXT2>:

That line represents the accessed memory:

    <N1>     is the hexadecimal address accessed.
    <N2>     is the length of the access (in decimal).
    <ATTR>   represents the type of access.
    <TEXT1>  lets you find where in your code the illegal access
             happened.
    <TEXT2>  describes the kind of illegal access.

If the length (<N2>) is 1, it was a byte access; 2 stands for a short
access, 4 for an int/long access, and more than 4 for a movem instruction.

The attributes, <ATTR>, can be "R--" or "-W-". The first one represents a
read access and the second a write access.

The text <TEXT1> looks like this:

    <NAME>, PC=$<PC#> HUNK=$<HUNK#> OFFSET=$<OFF#>

<NAME> is the name of the subroutine where the error occurred. It is
always displayed (even for a "static" one). The rest of the line may be
only partially displayed; it shows as much information as APurify can get.
<PC#> is a hexadecimal address pointing to the instruction that produced
the error. <HUNK#> and <OFF#> are the hunk number and the relative offset
of <PC#>. Using <HUNK#>, <OFF#> and a disassembler, you can very easily
find where your code is bad (BTW, I use dobj from netdcc, (c) by Matt
Dillon). Please note that in this new version, <PC#> no longer points to
some instruction before the faulty one. It is always the real faulty
address.

The remaining lines show the context of the illegal access. They give you
information about the surrounding memory blocks owned by your program.
Each block is displayed according to the following pattern:

    [$<N1>(<N2>) <ATTR> (<TEXT>)]

where <N1> is the hexadecimal address of the beginning of the block and
<N2> its length (in decimal). Note that the length may seem longer than
the one requested from malloc(), and the address may point before the one
you obtained from malloc(). This is not wrong!
In fact, the malloc() subroutine may add some information (like a doubly
linked list or the length of the allocation) to the block you requested.
This extra information is put before the address you receive, which
explains this behavior. In this version of APur.lib, it takes 12 ($C)
extra bytes. So if you allocate 10 bytes, don't be surprised if APurify
thinks you requested 22 bytes.

<ATTR> is made of 3 status characters, RWS, where

    R means: read-enabled block
    W means: write-enabled block
    S means: system block (block not controlled by the program).

If an access is forbidden, the character '-' replaces the corresponding
letter. <TEXT> is the name of the procedure that allocated the block.

With each block you can find an offset. That offset is the distance
between that block and the faulty address. In verbose mode, you can see
some text explaining the relative position of a block and the accessed
memory. In non-verbose mode you just see the offsets followed by the
blocks. The shortest offset is displayed first, since that block is the
one most likely overstepped.

When an illegal write occurs (the only really dangerous thing you can do
by indirection), a requester opens to tell you about it. With that
requester, you can stop your program to prevent the deadly error from
really happening. If you choose to do so, exit() is called. You can also
ignore that error, or ignore all such errors (but then you'll surely meet
the guru!).

APurify checks for memory allocated but not freed by the program (in fact,
it detects non-deallocated blocks when the library is closed).

It knows about memory locations that are independent of the program
execution, that is to say, the first kilobyte of memory (which contains
the interrupt vectors of the 680x0 processor), the program segments and
the stack. Accessing those blocks is illegal. They have the S attribute
(for SYSTEM blocks).
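
As a rough illustration of the two kinds of illegal access and of the
signed offset printed next to each block, the sketch below classifies one
access against one block. The names Range, classify() and distance() are
hypothetical, and the offset rule is only inferred from the numbers in the
sample report of the EXAMPLE section; it is not taken from APurify's
source. The offset of an overlapping access is not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical types for illustration; not APurify's internals. */
typedef struct {
    uint32_t start;   /* address of the first byte */
    uint32_t len;     /* length in bytes, must be > 0 */
} Range;

typedef enum { ACCESS_INSIDE, ACCESS_OUTSIDE, ACCESS_ACROSS } AccessKind;

/* Address of the last byte covered by a range. */
static int64_t last_byte(Range r)
{
    assert(r.len > 0);
    return (int64_t)r.start + r.len - 1;
}

/* Position of an access relative to ONE block:
 *   ACCESS_INSIDE   the access lies entirely within the block
 *   ACCESS_OUTSIDE  the access does not touch the block at all
 *   ACCESS_ACROSS   the access straddles a boundary of the block */
static AccessKind classify(Range block, Range access)
{
    if (last_byte(access) < block.start || access.start > last_byte(block))
        return ACCESS_OUTSIDE;
    if (access.start >= block.start && last_byte(access) <= last_byte(block))
        return ACCESS_INSIDE;
    return ACCESS_ACROSS;
}

/* Signed distance in bytes for a non-overlapping access (assumed rule):
 * negative when the access lies below the block (last accessed byte minus
 * block start), positive when it lies above (access start minus last byte
 * of the block). Returns 0 when the ranges overlap. */
static int64_t distance(Range block, Range access)
{
    if (last_byte(access) < block.start)
        return last_byte(access) - (int64_t)block.start;
    if (access.start > last_byte(block))
        return (int64_t)access.start - last_byte(block);
    return 0;
}
```

An access that is ACCESS_OUTSIDE for every known block is what the text
calls an access between blocks; ACCESS_ACROSS for some block is an
overstepping of that block.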

It takes into account memory blocks allocated by malloc() and AllocMem(),
and indirectly allocated blocks (by OpenScreen() for example). I did not
test the last kind of allocation, but it should be OK, because APurify
patches the AllocMem() and FreeMem() entries. Thus a program can access
the bitplanes of one of its screens without error.

If the program makes a legal access, but the attributes of the block are
incompatible with the kind of access, a protection-error message is
displayed. Currently only the first kilobyte is read/write-protected, but
this may change in the future.


HOW TO USE APURIFY:
------------------
One can see APurify as a pre-assembler. It must be run on the assembly
language source file just before the assembler. It scans the file and
changes it a bit so that APur.a can be used. The normal way to use it for
a C program is to:

    - compile the C source files and keep the assembly language source
      (.s).
    - run APurify on each .s file.
    - assemble each .s file to get a .o file.
    - link all .o files together with APur.a.

For example, using gcc on prog.c, this gives:

    CLI> gcc -g prog.c -o prog.s -S
    CLI> APurify -tb prog.s
    CLI> gcc -g prog.s -o prog -lAPur

As you can see, APurify needs no change to your C files. In this release
you no longer need to call AP_Init() in the main() function. The call is
automatically inserted when the main-entry label (specified by -mp) is
found.

You should not use dos.library/Exit() to abort your program; I think it
will crash if APurify is running. If you must use Exit(), call AP_Close()
just before calling Exit(). The explanation is simple: since some system
functions are patched, if a program exits without closing the library,
those patches will be corrupted, pointing to code that is no longer in
memory, and you'll meet the guru (i.e. the computer will crash)...
(You've been warned :-).

You can disable/enable the printing of messages with a call to
AP_Report(flag). If flag is true (i.e. different from zero), printing is
enabled; if it is false (i.e.
equal to zero), no output is produced. This is useful for startup code.
For example, if you are using the argv[] array in C, APurify will print a
lot of false errors. This is because the values pointed to by this array
are allocated before the library is opened. You can avoid this by calling
AP_Report(0) before (and AP_Report(1) after) the code that uses argv[].

When debugging an APurify'ed program, you can put a breakpoint on a
function called AP_Err(). That function is called each time APurify
detects an error. With that, you have the opportunity to look at your
program just before a faulty memory access occurs.

You can switch from verbose output to a shorter one with
AP_Verbose(flag). If flag is true, verbose mode is on. If it is false,
only short messages are printed. Some people prefer the latter, so that
is the default. If you prefer the verbose output, put AP_Verbose(1)
somewhere in your code and you'll get longer explanations about illegal
accesses.

You can specify a logfile where APurify writes its errors. To do this,
set the environment variable "APlog" (file ENV:APlog) to the name of a
logfile. If this variable is set, APurify appends all its output to the
indicated file. If this variable does not exist, the standard error
stream is used.


EXAMPLE:
-------
As an example, let's look at the test program compiled with gcc-2.6.0.
You'll see how you can use the APurify report it produces to find what is
wrong in the program. For this, I have included the commented report in
this document. My comments/explanations appear on lines beginning with a
"#".

**** APurify started on Thu Jan 4 23:03:58 1996
#
# Well, the report started...
#
**** APURIFY ERROR ! [$0026defc(4) R-- (_main, PC=$0027eef0 HUNK=$0 OFFSET=$410)] accessed between:
    -25 [$0026df18(27) RW- (_main)]
    +1405 [$0026d920(96) RWS (segment Module CLI)]
#
# Hum... First hit...
# It is an error in reading something in the main() procedure, between
# two blocks already allocated. The nearest block appears in first
# position, so we can think that the error was caused by accessing an
# array allocated in main() with a negative index. We can look at the
# code to find what is wrong with it. Using DOBJ, we find the following
# code at offset $410 in the first hunk:
#
#       00.00000410  24ab ffd8      MOVE.L  -40(A3),(A2)
#
# This corresponds to the C code:
#
#       a[0]=b[-10]
#
# Hence we've discovered a first error in the code. Note that -25 is the
# distance (in bytes) between the end of the accessed memory and the
# beginning of the array. This is not the difference between the
# beginning addresses of the two blocks!
#
**** APURIFY ERROR ! [$00245f20(4) R-- (_main, PC=$0027ef1a HUNK=$0 OFFSET=$43a)] accessed between:
    +1 [$00245f10(16) RW- (_main)]
    -162301 [$0026d920(96) RWS (segment Module CLI)]
#
# Well... here it seems to be an access just after an allocated block.
# The offset +1 is the distance in bytes between the accessed block and
# an allocated block. The situation is like this:
#
#       ---------[ 1 ]( 2 )---------->
#
# where "[ 1 ]" is the allocated block and "( 2 )" the accessed block.
# If we look in the code, we find:
#
#       00.0000043a  4aaa 0004      TST.L   4(A2)
#
# which corresponds to the test done by "if(a[1] == 0)". This is an error
# since the array 'a' is just 16-12=4 bytes long. So a[1] points outside
# the array!
#
**** APURIFY ERROR ! [$00245f1e(4) R-- (_read_shifted, PC=$0027ed9e HUNK=$0 OFFSET=$2be)] accessed across the ending boundary of:
    -2 [$00245f10(16) RW- (_main)]
#
# Hehe, another error... Damn! That test program is FULL of bugs!
# Yes, but this one is another kind of error. It is an access across a
# boundary. It occurs in the read_shifted() code. We need not look in
# the asm file to see the error. Here it is a misalignment error.
# Visually that gives:
#
#       ------------[ 1(]2 )----------->
#
#       [ 1 ] = allocated       ( 2 ) = accessed.
#
**** APURIFY ERROR ! [$00245f1c(4) R-- (_read_long, PC=$0027edce HUNK=$0 OFFSET=$2ee)] accessed between:
    -162305 [$0026d920(96) RWS (segment Module CLI)]
    +2382621 [$00000000(1024) --S (Basic 680x0 vectors)]
#
# That error is strange! It is not an access to an array with a negative
# index, as one might immediately think: we never call read_long() in
# such a way, and the offsets are too big! Indeed, the accessed memory
# was valid a moment ago, since it lies in the array 'a' (look at the
# second hit). Hence, it must be an access to free()'d memory. The error
# is then easily found in the code:
#
#       free_arg(a); read_long(a).
#                    ^^^^^^^^^^^^
#
**** APURIFY ERROR ! [$00000004(4) R-- (_read_page_zero, PC=$0027ee32 HUNK=$0 OFFSET=$352)] accessed on a read-protected block:
    +4 [$00000000(1024) --S (Basic 680x0 vectors)]
#
# Here the error is obvious: we are reading the zero page. If it were a
# write, that error would be very dangerous.
#
**** APURIFY WARNING ! Closing library without deallocation of the following block(s):
    - [$00271540(412) RW- (_main)]
    - [$00287070(12012) RW- (_main)]
    - [$0032e2c0(40012) RW- (_main)]
#
# The program has exit()ed. APurify tells us that we forgot to free
# those blocks. It is a case of memory leak. Those blocks were allocated
# in main(). They were allocated and lost by
#
#       a=malloc(4),malloc(400),malloc(12000),malloc(40000)
#
# since the assignment only affects the first item of ",,,". The
# reported lengths are the requested sizes plus the 12 extra bytes added
# by malloc().
#
**** APurify ended on Thu Jan 4 23:04:00 1996
#
# Well... done :-).
#


LEGAL PART:
----------
This program is provided 'AS IS'. I am not responsible for any damage it
may cause (but I am responsible for the benefits it can give you :-). Use
this software at your own risk.

This program is FREEWARE. You can use and distribute it as long as you
keep the archive intact (no alteration of files except for compression).
It can't be sold without my agreement (except for a minimal amount for
media support).
You must ask me before any commercial use of (any part of) this product.
I keep all my rights on this program and its future releases. I can
modify this software without telling the users.

If you wish, you can send me a postcard or anything else you want (money,
documentation, amiga, hardware stuff, ...) in exchange for using APurify.
But there is no obligation :-). My postal address is:

    M. DEVULDER Samuel
    1, Rue du chateau
    59380 STEENE
    FRANCE

(yes I'm French!). You can send suggestions or bug reports to my email
address:

    devulder@info.unicaen.fr


NOTES:
-----
It has been compiled with cross-gcc 2.7.0 with libnix on a Sun sparc.

I had the idea for this program after a chat with Cedric BEUST (AMIGA
NEWS) on IRC (Internet Relay Chat). Thanks Cedric!

I wish to thank Philippe Brand for his help with my port. I also wish to
thank J.C Hoehle for his useful advice.

All trademarks are the property of their respective owners.

There are some programs similar to APurify. For example, FORTIFY (Simon
P. Bullen), but it only detects illegal writes at the boundaries of
allocated blocks. Thus it can't detect big oversteps or oversteps while
reading, and the detection is not real-time. Enforcer can detect illegal
accesses to memory, but it needs a special device (MMU).


HINTS & TIPS:
------------
You can see some memory leaks with this version of APurify. It is not
really good but it can help. A memory leak occurs when a block of memory
is no longer pointed to by your program. Such blocks will necessarily be
displayed when your program exit()s, so with all the messages printed on
that occasion you can find them. I know this is not so great, but I think
it can help you a little bit (maybe in a future version I'll write some
code to really check for memory leaks).


BUGS:
----
APurify does not know about public memory that a program can read or
write without having allocated it. Thus, it will report an error when a
program reads or writes values in a message obtained through GetMsg()
calls.
Use AP_Report() to avoid such reports.

It can display messages about closing the library without freeing some
memory blocks. This is due to printf(), which allocates memory that is
only freed on exit. This is not a real bug, and you can avoid it by
calling AP_Report(0) just before exiting. But keep in mind that it is
better to display false bugs than to hide real ones.

I've rewritten malloc()/realloc()/free(). I hope this will not introduce
bugs (I've successfully run the test program with libnix and ixemul, so I
hope it will be all right).

There are certainly more bugs, and I'm waiting for your bug reports.